Multi-accented Mandarin Database Construction and Benchmark Evaluations

Authors

  • Xiang Yan
  • Lei He
  • Pei Ding
  • Rui Zhao
  • Jie Hao
Abstract

In this paper, we describe the design, recording, and checking procedures of a multi-accented Mandarin speech database and present a benchmark evaluation of it. The database was recorded in 6 cities in China and contains accented Mandarin speech from 1200 speakers, covering continuous digits, isolated words, and sentences. In total, 520k utterances (572.5 hours) were collected. We perform intra-accent and cross-accent evaluations, together with an evaluation of a multi-accented acoustic model trained on the whole database. The database is phonetically rich, gender-balanced, and accent-balanced; it could serve as basic material for accented Mandarin recognition research and could also be used to build real automatic speech recognition products for users with different accents.
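
To put the headline figures in perspective, the short sketch below derives a few per-speaker and per-utterance averages from the numbers quoted in the abstract. It is only back-of-the-envelope arithmetic: the function name `corpus_stats` is hypothetical, "520k" is presumably a rounded count, and the per-city figure assumes speakers are split evenly across the 6 cities, which the abstract suggests ("accent-balanced") but does not state.

```python
# Figures quoted in the abstract.
NUM_CITIES = 6
NUM_SPEAKERS = 1200
NUM_UTTERANCES = 520_000  # "520k" in the abstract; presumably rounded
TOTAL_HOURS = 572.5


def corpus_stats(cities, speakers, utterances, hours):
    """Derive simple averages from the headline corpus figures."""
    total_seconds = hours * 3600
    return {
        # Assumption: speakers are split evenly across recording cities.
        "speakers_per_city": speakers / cities,
        "utterances_per_speaker": utterances / speakers,
        "seconds_per_utterance": total_seconds / utterances,
        "minutes_per_speaker": hours * 60 / speakers,
    }


stats = corpus_stats(NUM_CITIES, NUM_SPEAKERS, NUM_UTTERANCES, TOTAL_HOURS)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, each speaker contributes roughly 430 utterances and a little under half an hour of audio, with an average utterance length of about 4 seconds, which is consistent with a mix of digit strings, isolated words, and short sentences.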

Similar resources

A Comparative Study on Tone Realization in Cantonese-Accented Mandarin and Standard Mandarin

The paper investigates tone realization in monosyllabic and disyllabic words in Cantonese, Cantonese-spoken Mandarin and Standard Mandarin, focusing on the deviations of tone realization in Cantonese-spoken Mandarin from standard Mandarin in monosyllabic and disyllabic words. The analysis on monosyllabic and disyllabic tone patterns shows that most of the tone deviations in Cantonese-accented M...

Improving Large Vocabulary Accented Mandarin Speech Recognition with Attribute-Based I-Vectors

It has been well-recognized that the accent has a great impact on the ASR of Chinese Mandarin, therefore, how to improve the performance on the accented speech has become a critical issue in this field. The attribute feature has been proven effective on modelling accented speech, resulting in a significantly improved performance in accent recognition. In this paper, we propose an attribute-base...

Robust automatic speech recognition for accented Mandarin in car environments

This paper addresses the issues of robust automatic speech recognition (ASR) for accented Mandarin in car environments. A robust front-end is proposed, which adopts a Minimum Mean-Square Error (MMSE) estimator to suppress the background noise in frequency domain, and then implements spectrum smoothing both in time and frequency index to compensate those spectrum components distorted by the nois...

Where does interlanguage speech intelligibility benefit come from: Shared phonological knowledge or exposure to accented speech

Previous studies in interlanguage speech intelligibility benefit (ISIB) did not separate the effects of shared knowledge of L1 in non-native talkers from those of listeners through extensive exposure to accented L2 speech, which is crucial to the mechanism underlying ISIB. This preliminary study attempts to tease apart the two by comparing perception accuracy of Mandarin-accented English words ...

Development of a multi-tiered speech annotation system for Modeling Accented English

This paper discusses methodological issues in the development of a multi-tiered phonetic annotation system, intended to capture pronunciation variation in the speech of second language learners and to serve in the construction of a database for training ASR models to recognize major pronunciation variants in the assessment of accented English.


Publication date: 2006